Adding Fault-tolerance to State Machine-based Designs
Abstract
Late detection of new types of faults often causes fault-tolerance requirements to evolve after developers have already created design artifacts. Reusing an existing design to develop a fault-tolerant version of it can therefore reduce overall development costs. Moreover, automating such reuse yields a fault-tolerant design that is correct by construction, provided that the existing design is correct. To facilitate this automation, we present an approach that adds three levels of fault-tolerance, namely failsafe, nonmasking, and masking, to functional designs represented as state machines. Intuitively, failsafe fault-tolerance requires that the safety specification is met even in the presence of faults. Nonmasking fault-tolerance guarantees that, in the presence of faults, the design recovers to states from which the safety and liveness specifications are satisfied. Masking fault-tolerance stipulates that (i) recovery is provided to states from which the safety and liveness specifications are met, and (ii) the safety specification is satisfied during such recovery. Specifically, we present sound and complete deterministic algorithms for the automated addition of (failsafe/nonmasking/masking) fault-tolerance to the functional design of concurrent programs. These polynomial-time algorithms are especially useful in model-driven development of fault-tolerant systems, where models are automatically checked and modified. We also discuss (1) the effect of distribution and of the safety specification model on the complexity of adding fault-tolerance, and (2) the impact of the proposed algorithms on the addition of multitolerance.
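The three levels described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's addition algorithms: it only checks whether a given state machine already meets each level, under simplifying assumptions. The safety specification is modeled as a set of bad states, recovery is judged on program transitions alone (faults are assumed to stop eventually), and all names (`fault_span`, `is_failsafe`, `is_nonmasking`, `is_masking`) are hypothetical.

```python
from typing import Hashable, Set, Tuple

State = Hashable
Transition = Tuple[State, State]


def fault_span(invariant: Set[State], program: Set[Transition],
               faults: Set[Transition]) -> Set[State]:
    """States reachable from the invariant via program or fault transitions."""
    reached = set(invariant)
    frontier = list(invariant)
    while frontier:
        s = frontier.pop()
        for a, b in program | faults:
            if a == s and b not in reached:
                reached.add(b)
                frontier.append(b)
    return reached


def is_failsafe(invariant, program, faults, bad_states) -> bool:
    """Assumed safety model: no bad state is reachable, even with faults."""
    return fault_span(invariant, program, faults).isdisjoint(bad_states)


def is_nonmasking(invariant, program, faults) -> bool:
    """Every program computation from the fault span reaches the invariant."""
    outside = fault_span(invariant, program, faults) - set(invariant)
    succ = {s: {b for a, b in program if a == s} for s in outside}
    # A deadlocked state outside the invariant can never recover.
    if any(not succ[s] for s in outside):
        return False
    # Repeatedly discard states whose successors have all recovered;
    # whatever remains lies on a cycle that avoids the invariant.
    remaining = set(outside)
    changed = True
    while changed:
        changed = False
        for s in list(remaining):
            if succ[s].isdisjoint(remaining):
                remaining.discard(s)
                changed = True
    return not remaining


def is_masking(invariant, program, faults, bad_states) -> bool:
    """Recovery to the invariant, with safety preserved along the way."""
    return (is_failsafe(invariant, program, faults, bad_states)
            and is_nonmasking(invariant, program, faults))
```

For instance, with invariant `{0, 1}`, program transitions `{(0, 1), (1, 0), (2, 0)}`, a single fault transition `(1, 2)`, and bad states `{3}`, the fault perturbs the machine to state 2, from which the program recovers to state 0 without ever reaching state 3, so this simplified check reports the design as masking.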
Similar resources
Detectors and Correctors: A Theory of Fault-Tolerance Components
Anish Arora, Sandeep S. Kulkarni. Department of Computer and Information Science, The Ohio State University, Columbus, Ohio 43210, USA. Abstract: In this paper, we show that two types of tolerance components, namely detectors and correctors, appear in a rich class of fault-tolerant systems. This class includes systems designed using the well-known techniques of e...
Stability Assessment Metamorphic Approach (SAMA) for Effective Scheduling based on Fault Tolerance in Computational Grid
Grid computing allows coordinated and controlled resource sharing and problem solving in multi-institutional, dynamic virtual organizations. Moreover, fault tolerance and task scheduling are important issues for large-scale computational grids because of the unreliable nature of grid resources. A commonly exploited technique to realize fault tolerance is periodic checkpointing, which periodically ...
Reversible Logic Multipliers: Novel Low-cost Parity-Preserving Designs
Reversible logic is one of the new paradigms for power optimization that can be used in place of current circuits. Moreover, fault-tolerance capability in the form of error detection or error correction is a vital aspect of current processing systems. In this paper, because multiplication is an important operation in computing systems, some novel reversible multiplier designs are propose...
Fault-Tolerance Implementation in Typical Distributed Stream Processing Systems
Typical training simulation systems adopt distributed network architecture designs composed of personal computers because of cost, extensibility, and maintenance considerations. In this design, the functions of the entire system are easily affected by failures or errors from any computer during operation. Thus, adopting appropriate fault-tolerance processing mechanisms to ensure that the normal...
An Evaluation of Shared Multicast Trees with Multiple Active Cores
Core-based multicast trees use less router state, but have significant drawbacks when compared to shortest-path trees, namely higher delay and poor fault tolerance. We evaluate the feasibility of using multiple independent cores within a shared multicast tree. We consider several basic designs and discuss how using multiple cores improves fault tolerance without sacrificing router state. We exa...